Digest.pl:
A Do-It-Yourself Archive for 
Subscribers to Mailing List Digests
Around the time of YAPC last year, ActiveState temporarily took all its mailing lists offline, among them the Perl-Win32-Users list, to which I subscribe.  When the lists came back online, they did not at first have an archiving capability, which I sorely missed.  I set out to construct my own archive and wrote a Perl program that takes daily digest files, strips out extraneous information and appends each message to a text file for its discussion thread.  Later, I realized that I could do this with other mailing lists besides ActiveState's.  The result is digest.pl.
1
Let's take a look at two examples.  Here's the Perl-Win32-Users digest.  Instructions are found at the top, followed by a list of today's topics, terminated by a delimiter pattern.  Individual messages are also separated by delimiters; each consists of a message number, several headers and a body.
2
Here's the Perl Beginners mailing list at Yahoo Groups.  Note the similarities in structure:  instructions, today's topics, individual messages separated by delimiters.  If we identify the way each digest records its list of topics, its delimiter and so forth, we can store these characteristics as the values of a hash of arrays.  Digest.pl uses a hash called %my_digests to store these values.
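
To make the idea concrete, here is a minimal sketch of what such a hash of arrays might look like.  Only the name %my_digests comes from the program itself; the keys (pw32u, pbml), the order of the array elements, the patterns, the directory names and the helper subroutines digest_info and split_messages are all assumptions made up for illustration, not the actual contents of digest.pl.

```perl
use strict;
use warnings;

# Hypothetical layout: each key names a digest; each value is an array
# reference holding that digest's characteristics in a fixed order.
# The patterns and directory names are invented for illustration.
my %my_digests = (
    pw32u => [
        'Perl-Win32-Users',           # 0: human-readable list name
        qr/^Today's Topics:/,         # 1: start of the list of topics
        qr/^-{20,}\s*$/,              # 2: delimiter between messages
        'archive/pw32u',              # 3: directory for thread files
    ],
    pbml => [
        'Perl Beginners',
        qr/^Topics in this digest:/,
        qr/^_{20,}\s*$/,
        'archive/pbml',
    ],
);

# Look up one digest's characteristics by its key.
sub digest_info {
    my ($key) = @_;
    my $entry = $my_digests{$key}
        or die "Unknown digest key '$key'\n";
    return @$entry;
}

# Split a digest's text into messages wherever a line matches
# that digest's delimiter pattern.
sub split_messages {
    my ($key, $text) = @_;
    my (undef, undef, $delimiter_re) = digest_info($key);
    my @messages = ('');
    for my $line (split /\n/, $text) {
        if ($line =~ $delimiter_re) {
            push @messages, '';       # delimiter: start a new message
            next;
        }
        $messages[-1] .= "$line\n";
    }
    return grep { /\S/ } @messages;   # discard empty chunks
}
```

Keeping the per-digest details in one data structure means that supporting another mailing list is, under these assumptions, a matter of adding one more entry rather than writing new parsing code.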
3

4
The user calls the program from the command line with an argument that names the particular digest to be processed and selects among various options.  By consulting a log file, the program determines which digests have already been processed and which are new, and processes only the new ones.
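
The log-file check can be sketched roughly as follows.  This is an illustration under stated assumptions, not the code in digest.pl: the log format (one digest file name per line) and the subroutine names read_log, new_digests and record_digest are hypothetical.

```perl
use strict;
use warnings;

# Read the names of already-processed digest files from a log,
# one name per line.  A missing log means nothing has been processed.
sub read_log {
    my ($log_file) = @_;
    my %seen;
    return \%seen unless -e $log_file;
    open my $in, '<', $log_file or die "Cannot read $log_file: $!";
    while (my $line = <$in>) {
        chomp $line;
        $seen{$line} = 1 if length $line;
    }
    close $in;
    return \%seen;
}

# Return only the candidate digest files that the log does not list.
sub new_digests {
    my ($seen, @candidates) = @_;
    return grep { !$seen->{$_} } @candidates;
}

# Append a processed digest's name to the log.
sub record_digest {
    my ($log_file, $name) = @_;
    open my $out, '>>', $log_file or die "Cannot append to $log_file: $!";
    print {$out} "$name\n";
    close $out;
}
```

A caller might take the digest key from $ARGV[0], list the digest files on disk, pass them through new_digests, and call record_digest after each one is processed so that a later run skips it.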

Each individual message gets a unique ID.  We use a cleaned-up version of the message's subject line as the name of the text file which holds an ongoing discussion thread.  Here's what a thread file from the Perl-Win32-Users list looks like ... and here are the log files and list of all topics.
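
As an illustration of what "cleaned-up" could mean here, the sketch below turns a subject line into a file name.  The specific rules (dropping reply prefixes, replacing unsafe characters with underscores, lower-casing) and the name thread_filename are assumptions; digest.pl's own rules may differ.

```perl
use strict;
use warnings;

# Turn a subject line into a file name for its discussion thread.
# The cleanup rules are guesses for illustration, not digest.pl's own.
sub thread_filename {
    my ($subject) = @_;
    $subject =~ s/^\s*Subject:\s*//i;                   # drop the header name
    1 while $subject =~ s/^\s*(?:Re|Fwd?):\s*//i;       # drop reply/forward prefixes
    $subject =~ s/[^A-Za-z0-9]+/_/g;                    # unsafe characters become _
    $subject =~ s/^_+|_+$//g;                           # trim stray underscores
    $subject = 'no_subject' unless length $subject;     # fallback for empty subjects
    return lc($subject) . '.txt';
}
```

Because the reply prefixes are stripped, an original posting and every reply to it map to the same file name, which is what lets one text file accumulate a whole thread.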

5
6
So what good is this program?  Well, in the last two weeks I was able to conduct an interesting bit of research about the Perl community itself by using my archive of nearly 4,000 thread files containing over 12,000 postings to two Perl digests.

In a recent article on perl.com, "Turning the Tide on Perl's Attitude Toward Beginners," Casey West wrote that the "Perl community has held tight to a 'zero tolerance' policy for beginners."  He cited the hostility beginners face when they inadvertently post questions that have been asked before, only to be flamed and ordered to "RTFM."
7
But I wondered whether the problem West described might be found in some parts of the community but not others.  The two Perl mailing lists to which I subscribe are remarkably flame-free, and I have never personally experienced hostility when posting to them.

I hypothesized that a crude measure of a particular mailing list's incivility would be how frequently "RTFM" appears in postings to the list.  With my archives I was able to test this hypothesis.  I wrote a Perl program that located each instance of "RTFM" and extracted the offending line of text.  I then eyeballed the data to eliminate cases where "RTFM" was merely quoted in a reply to an earlier message.  This left only original uses of "RTFM" in postings to these lists, which serve as an index of a particular mailing list's incivility.
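
A search of this kind might look like the sketch below.  It is not the program I actually ran: the name find_rtfm is hypothetical, and the automatic flagging of quoted lines (a leading ">" is assumed to mark quoted text) is only a convenience that would still need to be checked by eye.

```perl
use strict;
use warnings;

# Return every line of a thread that mentions RTFM, tagged as quoted
# or original.  Treating a leading ">" as the quote marker is an
# assumption; other mail clients quote differently.
sub find_rtfm {
    my ($thread_name, $text) = @_;
    my @hits;
    my $line_no = 0;
    for my $line (split /\n/, $text) {
        $line_no++;
        next unless $line =~ /\bRTFM\b/i;
        my $quoted = $line =~ /^\s*>/ ? 1 : 0;
        push @hits, {
            thread  => $thread_name,   # which thread file the hit came from
            line_no => $line_no,       # position of the line in that file
            quoted  => $quoted,        # 1 if the line looks like quoted text
            text    => $line,          # the offending line itself
        };
    }
    return @hits;
}
```

Running this over every thread file and keeping the hits whose quoted flag is 0 would give a first-pass list of original uses to review by hand.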

Here are the results:
8
By our crude measure, the tone of the discussion on these two Perl mailing lists appears remarkably civil.  "RTFM" appears in fewer than one percent of all postings.  Like Casey West, you may feel that other parts of the Perl community are hostile to beginners.  But in certain corners, civility rules.


Jim Keenan
Brooklyn, NY
jkeen@concentric.net
June 9, 2001

References:
James E. Keenan, "Digest.pl:  A Do-It-Yourself Archive for Subscribers to Mailing List Digests," http://www.concentric.net/~Jkeen/digest/digest.zip

Casey West, "Turning the Tide on Perl's Attitude Toward Beginners," http://www.perl.com/pub/2001/05/29/tides.html
 

